  • Grok-1.5V, or Grok 1.5 "Vision", is xAI's first-generation multimodal model. It will be able to respond to uploaded images and reason through complex documents, science diagrams, charts, screenshots, and photographs. It will also have real-world spatial understanding, helping it interpret the physical scenes shown in users' uploaded images.

    Monday, April 15, 2024
  • Building on UserZoom's capabilities, UserTesting has created a new Feedback Engine that uses generative AI to help teams better understand feedback from user surveys. The company is also expanding its AI-powered insights capability with AI-powered surveys, providing deeper analysis of trends across user testing. AI-powered theme generation will help researchers better understand test results.

  • At WWDC24, Apple introduced Apple Intelligence, a personal, private intelligence system powered by generative AI models. Its on-device and server-based foundation models are fast and efficient thanks to techniques such as low-bit palettization and task-specific adapters. Apple evaluated the models with human evaluations and a range of benchmarks.

  • Rabbit is set to release a significant update for its r1 device on October 1, 2024, which will introduce a web-based version of its Large Action Model (LAM) agent. The announcement follows a difficult launch period in which the company was criticized for failing to deliver on its ambitious promises. CEO Jesse Lyu acknowledged that expectations were set too high from the start but said he was optimistic that the update would significantly expand the device's capabilities. The r1, which drew attention as a must-have gadget in early 2024, has received numerous updates aimed at improving its functionality, but it has largely been limited to a handful of services, such as Uber and Spotify.
The new LAM is meant to be more versatile, letting users carry out a wide range of tasks across websites, such as buying tickets or registering domains. Lyu said the update would let the r1 support virtually any action that can be completed on a website. In a demonstration, he showed the agent breaking a task into manageable steps and executing them by analyzing the elements on each webpage. When tasked with registering a domain for a film festival, for example, the agent searched Google for a suitable domain registration service and completed the purchase. The agent runs in a cloud-based browser, and local versions, such as a Chrome extension, are planned to improve the user experience and security.
Despite these features, Lyu noted that the agent is still in development and requires careful prompt engineering to produce the desired results. The model can plan, but it cannot skip unnecessary steps, which may hurt the user experience. User data will not be collected to improve the model at this stage, although a "teach mode" is planned for a future update to let users guide the agent through specific tasks.
Rabbit's approach is to build a cross-platform AI agent that operates independently of existing applications, which Lyu argues is a more sustainable model than developing a standalone app. He believes that building an app would limit the agent's potential and create conflicts with major platforms such as Apple and Google. The company plans to extend the agent beyond web interactions to desktop applications and, potentially, mobile devices. As the launch date approaches, Lyu has tempered expectations, reminding users that although the new model represents significant progress, it still has limitations. The update is expected to make the r1 a more functional and versatile tool, marking a step forward in Rabbit's goal of building a comprehensive AI assistant.
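The Apple Intelligence item mentions low-bit palettization without describing it. As a generic illustration only, and not Apple's implementation (the item gives no details of it), palettization stores each weight as a short index into a small table ("palette") of shared values. The sketch below assumes a 2-bit palette (4 entries) fitted with simple 1-D k-means in plain Python; the function names `palettize`, `dequantize`, and `nearest_index`, the k-means fitting choice, and the sample weights are all hypothetical.

```python
def nearest_index(w, palette):
    """Index of the palette entry closest to weight w."""
    return min(range(len(palette)), key=lambda j: abs(w - palette[j]))


def palettize(weights, bits=2, iters=50):
    """Hypothetical sketch: compress weights to a palette of at most 2**bits values.

    Returns (palette, indices). Each weight is later rebuilt as
    palette[indices[i]], so only `bits` bits per weight plus the small
    palette need to be stored. The palette is fitted with 1-D k-means
    (Lloyd's algorithm), starting from evenly spaced values.
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    if lo == hi:
        # All weights identical: a single palette entry is exact.
        return [lo], [0] * len(weights)
    step = (hi - lo) / (k - 1)
    palette = [lo + j * step for j in range(k)]
    for _ in range(iters):
        # Assignment step: map every weight to its nearest palette entry.
        indices = [nearest_index(w, palette) for w in weights]
        # Update step: move each entry to the mean of the weights assigned to it.
        new_palette = list(palette)
        for j in range(k):
            members = [w for w, i in zip(weights, indices) if i == j]
            if members:
                new_palette[j] = sum(members) / len(members)
        if new_palette == palette:  # converged: assignments no longer change
            break
        palette = new_palette
    # Final assignment against the final palette.
    indices = [nearest_index(w, palette) for w in weights]
    return palette, indices


def dequantize(palette, indices):
    """Rebuild approximate weights from the palette and per-weight indices."""
    return [palette[i] for i in indices]


# Hypothetical toy weights, not taken from any real model.
weights = [-0.9, -0.85, -0.1, 0.05, 0.12, 0.8, 0.95, 1.0]
palette, indices = palettize(weights, bits=2)
approx = dequantize(palette, indices)
print(palette, indices)
print(max(abs(w - a) for w, a in zip(weights, approx)))
```

With `bits=2`, each weight costs a 2-bit index plus a share of one 4-entry table instead of a full 16- or 32-bit float; the printed maximum error shows what that compression costs on this toy input. Real toolchains add refinements, such as separate palettes per tensor or per group of weights, that this sketch omits.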